Sourcerer: A Search Engine for Open Source Code
Abstract
Sourcerer is a search engine for open source code that extracts fine-grained structural information from the code. This information is used both to implement a basic notion of code rank and to enable search forms that go beyond conventional keyword-based search. Sourcerer supports two types of searches: (1) implementations and their uses; and (2) program structures. Several schemes for ranking code search results were compared. Results are reported for 1,555 open source Java projects, corresponding to 254 thousand classes and 17 million lines of code (LOC). Of the schemes compared, the one that produced the best search results combined (a) the standard TF-IDF technique over Fully Qualified Names (FQNs) of code entities, (b) a “boosting” factor for terms found toward the right-hand end of FQNs, and (c) a composition with a graph-rank algorithm that identifies popular classes.
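The three-part ranking scheme named in the abstract can be illustrated with a minimal sketch. It is not Sourcerer's implementation: the abstract does not give the formulas, so the linear position boost, the smoothed IDF, the PageRank-style graph rank, and the multiplicative composition are all assumptions, as are the function names `fqn_terms`, `boosted_tfidf`, `page_rank`, and `search`.

```python
import math


def fqn_terms(fqn):
    """Split an FQN such as 'util.text.Parser' into lowercase terms."""
    return [part.lower() for part in fqn.split(".") if part]


def boosted_tfidf(query, fqn, corpus):
    """TF-IDF of the query against one FQN, boosted toward the right end.

    Assumption: a term at position i (0-based) of an n-term FQN gets
    weight (i + 1) / n, so the rightmost term counts the most.
    """
    terms = fqn_terms(fqn)
    score = 0.0
    for q in query.lower().split():
        # Document frequency: number of FQNs in the corpus containing q.
        df = sum(1 for other in corpus if q in fqn_terms(other))
        # Smoothed IDF (an assumed variant, not taken from the paper).
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
        # Position-weighted term frequency.
        tf = sum((i + 1) / len(terms) for i, t in enumerate(terms) if t == q)
        score += tf * idf
    return score


def page_rank(uses, damping=0.85, iterations=50):
    """Plain PageRank over a 'uses' graph {class: [classes it uses]}."""
    nodes = set(uses) | {t for targets in uses.values() for t in targets}
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            targets = uses.get(node, [])
            # Classes that use nothing spread their rank over all nodes.
            receivers = targets if targets else nodes
            share = damping * rank[node] / len(receivers)
            for target in receivers:
                new[target] += share
        rank = new
    return rank


def search(query, uses):
    """Rank FQNs by boosted TF-IDF multiplied by graph rank (assumed composition)."""
    rank = page_rank(uses)
    corpus = sorted(rank)
    scored = [(boosted_tfidf(query, fqn, corpus) * rank[fqn], fqn) for fqn in corpus]
    return [fqn for score, fqn in sorted(scored, reverse=True) if score > 0]
```

With a hypothetical graph in which `util.text.Parser` is used by three classes and `parser.internal.Helper` by none, `search("parser", uses)` places `util.text.Parser` first: the query term matches its rightmost FQN term, and the class is popular in the graph.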
Similar resources
Sourcerer: An infrastructure for large-scale collection and analysis of open-source code
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code available online. However, collecting and analyzing such a large quantity of source code presents a number ...
A Study of Ranking Schemes in Internet-Scale Code Search
The large availability of source code on the Internet is enabling the emergence of specialized search engines that retrieve source code in response to a query. The ability to perform search at this scale amplifies some of the problems that also exist when search is performed at single-project level. Specifically, the number of hits can be several orders of magnitude higher, and the variety of c...
Mining Internet-Scale Software Repositories
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, and database storage of open source software. Sourcerer allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 Java projects from SourceForge and Apache totali...
Integrating S6 code search and Code Bubbles
We wanted to provide a tool for doing code search over open source repositories as part of the Code Bubbles integrated development environment. Integrating code search as a plug-in to Code Bubbles required substantial changes to the S6 code search engine and the development of appropriate user interfaces in Code Bubbles. After briefly reviewing Code Bubbles and the S6 search engine, this paper ...
NEGWeb: Detecting Neglected Conditions via Mining Programming Rules from Open Source Code
Neglected conditions, also referred to as missing paths, are known to be an important class of software defects. Revealing neglected conditions around individual API calls in an application requires the knowledge of programming rules that must be obeyed while reusing those APIs. To mine those implicit programming rules and hence to detect neglected conditions, we develop a novel framework, called ...
Publication date: 2006